Naive Forecasts

Week 2

Evie Zhang

Old Dominion University

Topics

  1. \(E[y]\), \(\hat{y}\)
  2. Loss Functions
  3. Lags, Leads
  4. Conditional Forecasts
  5. Time Series Components

International Air Passengers

Code
data("AirPassengers")
air <- AirPassengers; rm(AirPassengers)
range(air)
[1] 104 622
  • What would be a good forecast, and why?

  • What is the “goal” or “objective”?

International Air Passengers

Does visualizing the data change your mind?

Code
plot(density(air), xlab = "Count of Passengers", ylab = "Density", main = "")

International Air Passengers

  • Point Forecast: a best guess (\(\hat{y}\)) for an unknown value.

  • Forecast Distribution: a distribution, or range, of possibilities and their corresponding likelihoods/probabilities.

  • Point Forecast (again): a summary statistic of a forecast distribution.

    • How do we decide on the appropriate summary measure for our point forecast?
    • Loss Function

Forecast Error

\[ e = y - \hat{y} \]

Will there always be forecast error?

So long as \(y\) is not deterministic, yes!

Of course, some errors are worse than others.

\[ L(e): \text{Loss Function} \]

Loss Functions

What are some examples of Loss Functions?

Absolute Loss

\[L(e) = |e| \]

Quadratic Loss

\[L(e) = e^2 \]

What are some features (good / bad) about these Loss Functions?
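As a quick illustration (not from the slides; the errors below are hypothetical), here is how the two losses treat the same set of errors. Note how quadratic loss punishes large misses disproportionately:

```r
# Hypothetical forecast errors, chosen for illustration
e <- c(-10, -1, 0, 1, 10)
abs_loss  <- abs(e)  # L(e) = |e|
quad_loss <- e^2     # L(e) = e^2
data.frame(e, abs_loss, quad_loss)
```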

Risk

Forecasts seek to minimize risk. Risk is defined as the expected loss:

\[R(\hat{y}) = E[L(e)] = E[L(y - \hat{y})]\]

For quadratic loss:

\[ E[(y - \hat{y})^2] = E[y^2] - 2\hat{y}E[y] + \hat{y}^2\]

Remember, we know \(E[y]\). From here, we can minimize this function with respect to \(\hat{y}\).

\[\frac{d}{d\hat{y}}\left(E[y^2] - 2\hat{y}E[y] + \hat{y}^2\right) = -2E[y] + 2\hat{y} = 0\]

After simplifying, we’ll have \(\hat{y} = E[y]\). So, the \(\hat{y}\) that minimizes our risk is the mean of the distribution of \(y\).
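A small simulation sketch confirms this (the exponential distribution below is an arbitrary choice; any i.i.d. sample works): the candidate \(\hat{y}\) that minimizes empirical quadratic risk lands on the sample mean.

```r
set.seed(1)
y <- rexp(10000, rate = 1)                 # an arbitrary, skewed distribution
risk <- function(yhat) mean((y - yhat)^2)  # empirical quadratic risk
cand <- seq(0, 3, by = 0.001)              # grid of candidate forecasts
best <- cand[which.min(sapply(cand, risk))]
c(best = best, sample_mean = mean(y))      # nearly identical
```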

Expectations

So, if \(\hat{y}^* = E[y]\), how do we calculate \(E[y]\)?

Continuous:

  • \(\int x f(x) \,dx\), where \(f\) is the density

Discrete:

  • \(\sum xP(x)\)

Empirical:

  • \(\frac{1}{n}\sum_{i = 1}^{n} x_i\)
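A hypothetical discrete example ties the discrete and empirical formulas together: the probability-weighted sum gives \(E[y]\) exactly, and the empirical mean of a large sample approximates it.

```r
# Hypothetical three-point distribution
x <- c(1, 2, 3)
p <- c(0.2, 0.5, 0.3)
ev <- sum(x * p)  # discrete expectation: sum of x * P(x) = 2.1
set.seed(2)
draws <- sample(x, size = 1e5, replace = TRUE, prob = p)
mean(draws)       # empirical mean, close to ev
```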

Expectation – Weighted Coin

Suppose you flip a weighted coin with the following distribution:

\[P(y = T) = \frac{3}{7}\] \[P(y = H) = \frac{4}{7}\]

If the coin comes up tails, you are given $40. If the coin comes up heads, you owe $25.

What is the expected value of the coin flip? What is the expected “risk” (i.e. \(E[L(e)]\))?

International Air Passengers, Part 2

Let’s use our shiny new optimal forecast.

Code
plot(air, xlim = c(1949, 1963))
air2 <- ts(mean(air), start = c(1961, 1), end = c(1962, 12), freq = 12)
lines(air2, col = "tomato", lwd = 2, lty = 2)

This looks terrible, but let’s continue for now.

Forecast Interval

If our variable of interest is normally distributed, we can calculate a forecast interval for our predicted outcome.

\[ [\mu - \sigma Z_{\alpha/2}, \mu + \sigma Z_{\alpha/2}] \]

\[ [\hat{y} - \sigma Z_{\alpha/2}, \hat{y} + \sigma Z_{\alpha/2}] \]

\[ [280.3 - 120 * 1.96, 280.3 + 120 * 1.96] \]

\[ [45.1, 515.5] \]

International Air Passengers, Part 3

Code
air2 <- ts(mean(air), start = c(1961, 1), end = c(1962, 12), freq = 12)
air2_u <- air2 + (1.96*sd(air))
air2_l <- air2 - (1.96*sd(air))
plot(air, xlim = c(1949, 1963),
     ylim = c(min(air, air2_l),
              max(air, air2_u)))
lines(air2, col = "tomato", lwd = 2, lty = 2)
lines(air2_u, lty = 1, col = "tomato")
lines(air2_l, lty = 1, col = "tomato")

How to Code

Code
library("lubridate")
timez <- as_date(ymd("1949-01-01"):ymd("1962-12-01")); head(timez, 3)
timez <- floor_date(timez, "month"); head(timez, 3)
timez <- unique(timez); head(timez, 3)
air3 <- data.frame(time = timez,
                   air = c(air, rep(NA, 24)))
head(air3, 3)

reg <- lm(air ~ 1, data = air3)
air3$pred <- predict(reg, air3)
air3$pred_l <- air3$pred - (1.96*sd(air3$air, na.rm = TRUE))
air3$pred_u <- air3$pred + (1.96*sd(air3$air, na.rm = TRUE))
[1] "1949-01-01" "1949-01-02" "1949-01-03"
[1] "1949-01-01" "1949-01-01" "1949-01-01"
[1] "1949-01-01" "1949-02-01" "1949-03-01"
        time air
1 1949-01-01 112
2 1949-02-01 118
3 1949-03-01 132

How to Code

Code
plot(air3$time, air3$air,
     xlim = ymd(c("1949-01-01", "1963-01-01")),
     ylim = c(min(air3$air, air3$pred_l, na.rm = TRUE),
              max(air3$air, air3$pred_u, na.rm = TRUE)),
     type = "l", xlab = "", ylab = "Frequency"
     )

lim <- air3$time > ymd("1960-12-01")
lines(air3$time[lim], air3$pred[lim], col = "tomato", lty = 2)
lines(air3$time[lim], air3$pred_l[lim], col = "tomato")
lines(air3$time[lim], air3$pred_u[lim], col = "tomato")

Time

Everything we’ve talked about thus far has been as true for cross section as for time series data.

Let’s talk about time now.

Cross Section vs Time

  • Cross section units are often denoted with an \(i\), indexing the individual unit.
    • \(Y_i\), \(X_i\)
  • Time series units are denoted with a \(t\), representing time.
    • \(Y_t\), \(X_t\)
  • Panel units are denoted with both \(i\) and \(t\).
    • \(Y_{it}\), \(X_{it}\)

Time

Time, or rather periodicity, can come in many forms:

  • Year
  • Month
  • Day
  • Minute
  • Transaction, or an instance

This is called frequency.

Leads, Lags

A common feature of time series data is intertemporal dependency. For this, we have lags and leads.

Lags: \(y_{t-1}\), \(y_{t-2}\), \(y_{t-k}\)

Leads: \(y_{t+1}\), \(y_{t+2}\), \(y_{t+k}\)

Sample: \(\{y_{1}, y_{2}, y_{3}, ..., y_{T}\}\)
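One base-R way to build these shifts (a sketch; note that `stats::lag()` on a `ts` object shifts the time index rather than padding with `NA`):

```r
y <- c(5, 7, 9, 11, 13)      # y_1, ..., y_T
lag1  <- c(NA, head(y, -1))  # y_{t-1}: shift forward, pad the start
lead1 <- c(y[-1], NA)        # y_{t+1}: shift backward, pad the end
data.frame(y, lag1, lead1)
```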

Forecast Notation

Periods:

Sample: \(\{y_{1}, y_{2}, y_{3}, ..., y_{T}\}\)

Out of Sample: \(\{y_{T+1}, y_{T+2}, ..., y_{T+h}\}\) where \(h\) is the horizon

Notation:

\(\hat{y}\) : \(y\) \(\implies\) \(\hat{y}_t\) : \(y_t\)

\(\hat{y}_t\) : \(y_t\) \(\implies\) \(\hat{y}_{T+h}\) : \(y_{T+h}\)

However, these still leave the conditioning information implicit, so we write:

\(\hat{y}_{T+h|t}\), \(y_{T+h|t}\)

Conditioning

  1. Conditioning with relevant variables will improve the forecast (i.e., reduce risk)

  2. Both point forecasts and forecast intervals are functions of conditioning variables.

Let’s call all available information \(\Omega_t\).

Conditioning Examples

Code
par(mfrow = c(1, 2))
b <- read.csv("../data/bball_allstars.csv")
plot(density(b$HEIGHT),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players", xlab = paste("Mean:", round(mean(b$HEIGHT), 1)))
abline(v = mean(b$HEIGHT))

plot(density(b$HEIGHT[b$LEAGUE == "NBA"]),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players", xlab = paste("Mean:", round(mean(b$HEIGHT[b$LEAGUE == "WNBA"]), 1), "|", round(mean(b$HEIGHT[b$LEAGUE == "NBA"]), 1)),
     col = "dodgerblue")
lines(density(b$HEIGHT[b$LEAGUE == "WNBA"]),
      col = "tomato")
abline(v = mean(b$HEIGHT))
abline(v = mean(b$HEIGHT[b$LEAGUE == "NBA"]),
       col = "dodgerblue", lty = 2)
abline(v = mean(b$HEIGHT[b$LEAGUE == "WNBA"]),
       col = "tomato", lty = 2)
legend("topright",
       legend = c("WNBA", "NBA"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

Code
par(mfrow = c(1, 2))
b$letter <- substr(b$PLAYER, 1, 1) %in% letters[1:10]
plot(density(b$HEIGHT),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players", xlab = paste("Mean:", round(mean(b$HEIGHT), 1)))
abline(v = mean(b$HEIGHT))

plot(density(b$HEIGHT[b$letter]),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players", xlab = paste("Mean:", round(mean(b$HEIGHT[!b$letter]), 1), "|", round(mean(b$HEIGHT[b$letter]), 1)),
     col = "dodgerblue")
lines(density(b$HEIGHT[!b$letter]),
      col = "tomato")
abline(v = mean(b$HEIGHT))
abline(v = mean(b$HEIGHT[b$letter]),
       col = "dodgerblue", lty = 2)
abline(v = mean(b$HEIGHT[!b$letter]),
       col = "tomato", lty = 2)
legend("topright",
       legend = c("First Name A:J", "First Name K:Z"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

Time Series Components

Just like with cross sectional settings, we may (or may not) observe some additional variables. However, we still need to model the Data Generating Process (DGP).

Data are not a choice, per se, but you do choose functional forms and specifications.

Once we select a model, we need to estimate the model’s parameters with data.

\[y_t = f(y_{t-1}, x_{t}, x_{t-1}, \tau_t, C_t, S_t)\]

Time Series Components

\[y_t = f(y_{t-1}, x_{t}, x_{t-1}, \tau_t, C_t, S_t)\]

\[y_t = \tau_t + C_t + S_t\]

  • \(\tau\): Trend
  • \(C\): Cycle
  • \(S\): Season

Components

Trend

  • Very long term; Smooth

Cycle

  • Repeats every 2-7 years; the Business Cycle

Season

  • Annual patterns

It is useful to consider these components separately (additively).
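Base R's `decompose()` gives a quick additive split into these pieces, using a moving-average trend and monthly seasonal means. (For `AirPassengers` the seasonal swings grow with the level, so an additive split is only a rough sketch here.)

```r
data("AirPassengers")
dec <- decompose(AirPassengers)  # additive: y_t = trend + seasonal + random
plot(dec)                        # panels: observed, trend, seasonal, random
```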

Components

Code
plot(air, xlab = "Month", ylab = "Count of Passengers")

Mean Forecast

The simplest model has an intercept, but no trend, cycle, or season.

\[E[y_{t+h}|\Omega_t] = \beta_0\]

This is simple, but what type of setting would this be appropriate for?

Naive Forecast

Code
consume <- read.csv("../data/realpersonalconsumptionexpend_pc.csv")
consume$DATE <- ymd(consume$DATE)
Code
plot(consume$DATE, consume$PCEC96_PC1,
     type = "l", lwd = 2,
     xlab = "Month",
     ylab = "Percent Change (Year over Year)",
     main = "Real Personal Consumption Expenditures (PCEC96)")
abline(h = 0, lty = 2)
abline(h = mean(consume$PCEC96_PC1),
       col = "dodgerblue", lwd = 2)

Code
plot(consume$DATE[consume$DATE < ymd("2020-01-01")],
     consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")],
     xlim = c(min(consume$DATE), max(consume$DATE)),
     type = "l",
     xlab = "Month",
     ylab = "Percent Change (Year over Year)",
     main = "Real Personal Consumption Expenditures (PCEC96)")
abline(h = 0, lty = 2)
abline(h = mean(consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")]),
       col = "dodgerblue", lwd = 2)

Code
plot(consume$DATE[consume$DATE < ymd("2020-01-01")],
     consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")],
     xlim = c(min(consume$DATE), max(consume$DATE)),
     type = "l",
     xlab = "Month",
     ylab = "Percent Change (Year over Year)",
     main = "Real Personal Consumption Expenditures (PCEC96)")
abline(h = 0, lty = 2)
m <- consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")]
segments(x0 = ymd("2020-01-01"),
         x1 = max(consume$DATE),
         y0 = mean(m),
         lty = 1, lwd = 2, col = "mediumseagreen")
segments(x0 = ymd("2020-01-01"),
         x1 = max(consume$DATE),
         y0 = mean(m) + 1.645*sd(m),
         lty = 2, lwd = 2, col = "mediumseagreen")
segments(x0 = ymd("2020-01-01"),
         x1 = max(consume$DATE),
         y0 = mean(m) - 1.645*sd(m),
         lty = 2, lwd = 2, col = "mediumseagreen")
legend("bottomright", horiz = T,
       legend = c("Point Forecast", "90% CI"),
       lty = 1:2, col = "mediumseagreen", bty = "n")

Errors vs Residuals

Forecast Errors

\(e_{t+h} = y_{t+h} - E[y_{t+h}|\Omega_t] \implies y_{t+h} = E[y_{t+h}|\Omega_t] + e_{t+h}\)

Residuals

\(\hat{e}_{t+h} = y_{t+h} - \hat{y}_{t+h} = y_{t+h} - \hat{\beta}_0\)

It may be helpful to plot the residuals over time to see if there are any remaining patterns.
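For the intercept-only (mean) forecast, that residual plot is one line of code; with `AirPassengers`, the leftover trend and seasonality show up immediately (a sketch reusing the data from earlier slides):

```r
data("AirPassengers")
res <- AirPassengers - mean(AirPassengers)  # y_t minus the estimated beta_0
plot(res, xlab = "Month", ylab = "Residual")
abline(h = 0, lty = 2)
```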

Next Class

  1. Forecast Variance

  2. Trend Models

    • Functional Forms
    • Intercept Breaks
    • Trend Breaks